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Abstract. We apply multivariate Lagrange interpolation to synthesiz¬ 
ing polynomial quantitative loop invariants for probabilistic programs. 

We reduce the computation of a quantitative loop invariant to solving 
constraints over program variables and unknown coefficients. Lagrange 
interpolation allows us to find constraints with less unknown coeffi¬ 
cients. Counterexample-guided refinement furthermore generates linear 
constraints that pinpoint the desired quantitative invariants. We evalu¬ 
ate our technique by several case studies with polynomial quantitative 
loop invariants in the experiments. 

1 Introduction 

A probabilistic program may change its computation due to probabilistic choices. 
Consider, for instance, the Miller-Rabin algorithm for primality test [27]. Given 
a composite number, the algorithm reports incorrectly with probability at most 
0.25. Since the outcome of the algorithm is not always correct, classical program 
correctness specifications [9,14,20] do not apply. For probabilistic programs, 
quantitative specifications are needed to reason about program correctness [8,23, 
24]. Instead of logic formulae, probabilistic programs are specified by numerical 
functions over program variables. Since a probabilistic program gives random 
outcomes, a numerical function may have different values on different executions. 
The expected value of a numerical function is then determined by the probability 
distribution induced by the executions of program. 

Since probabilistic programs are specified by numerical functions, their cor¬ 
rectness can be established by annotations with expectations. In particular, cor¬ 
rectness of while loops can be proved by inferring special expectations called 
the quantitative loop invariants [24,25]. Similar to classical programs, finding 
general quantitative loop invariants is hard. Techniques for generating linear 
quantitative loop invariants however are available [1,15, 22, 25]. 

Interestingly, existing linear loop invariant generation techniques can be ex¬ 
tended to synthesize polynomial invariants [1]. Observe that polynomial mul¬ 
tivariate polynomials are linear combinations of monomials. For instance, any 
polynomial over x, y with degree 2 is a linear combination of the monomials 
and xy. It suffices to find coefhcients of the monomials to represent 
any multivariate polynomial of a fixed degree. Linear loop invariant generation 



techniques can hence be applied to infer invariants of a fixed degree. The number 
of monomials however grows rapidly. Quadratic polynomials over 5 variables, for 
example, are linear combinations of 21 monomials. One then has to find as many 
coefficients. It is unclear whether the extended approach is still feasible. 

In this paper, we develop a Lagrange interpolation-based technique to syn¬ 
thesize polynomial loop invariants for simple loops in probabilistic programs. 
Lagrange interpolation is a well-known method to construct explicit expres¬ 
sions for polynomials by sampling. For example, suppose that the values of f{x) 
are known to be si, S 3 , and S 4 at the sampling points 1, 3, and 4, respec¬ 
tively. By Lagrange interpolation, we immediately have an explicit expression 
of f{x) = Si • -k S3 • + S 4 • Our new technique 

employs multivariate Lagrange interpolation. Similar to previous techniques [22, 
15], we use conditions of quantitative loop invariants as constraints. Lagrange 
interpolation moreover allows us to simplify the constraints and sometimes to 
determine several coefficients. In the example, suppose /(3) = 1 is known. Then 
it suffices to determine si and S 4 to construct an explicit expression of /(x). 
In contrast, if f{x) is represented as cq + cix + C 2 X^, then /(3) = 1 only gives 
Co-k 3ci-k 9 c 2 = 1 and determines none of the coefficients. Lagrange interpolation 
hence can reduce the number of unknown coefficients and make our technique 
more scalable. 

Although there are less unknown coefficients, one still has to solve non-linear 
constraints. We give heuristics to determine coefficients efficiently. Our heuris¬ 
tics first perform random experiments and obtain linear constraints about co¬ 
efficients. An SMT solver is then used to find candidate coefficients from the 
constraints. If there is no candidate, then the desired loop invariant does not ex¬ 
ist. Otherwise, quantifier elimination verifies whether the candidate coefficients 
give a loop invariant. If so, our technique has found a quantitative loop invariant. 
Otherwise, we add more linear constraints to exclude infeasible coefficients. 

We apply our technique to find quantitative loop invariants for ten annotated 
loops from non-trivial probabilistic programs. Our case studies range from gam¬ 
bler’s ruin problem [13] to simulation of a fair coin with a biased coin [15]. Over 
1000 random runs, our technique is able to synthesize polynomial quantitative 
loop invariants within 15 seconds on average. Besides, 97.5% of the runs can 
finish within a 300-second timeout. 

Related Work. Constraint-based techniques for automated loop invariants 
generation have been much progressed over the past years [4, 5,17,18, 21, 22, 29]. 
Gupta and Rybalchenko [18,19] proposed a CEGAR framework, so that static 
and dynamic information of a program can be exploited incrementally to restrict 
the search space of qualitative loop invariants. Sankaranarayanan et al. [29] used 
Grobner bases to reduce the generation of algebraic polynomial loop invariants 
to solving non-linear constraints in the parametric linear form. These techniques 
however deal with classical programs and cannot be applied to probabilistic pro¬ 
grams directly. Mclver and Morgan [24] were among the first to consider quan¬ 
titative loop invariants for probabilistic programs. Katoen et al. [22] studied the 
synthesis of quantitative loop invariants using a constraint-solving approach. 
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The approach was further developed and implemented in the Prinsys tool [15], 
which synthesizes quantitative invariants by solving constraints over unknown 
template coefficients. The performance of the tool however is sensitive to manu¬ 
ally supplied templates. Recently, a technique based on abstract interpretation 
is proposed in [Ij. It formulates linear loop invariants with the collecting se¬ 
mantics and synthesizes coefficients via fixed-point computation. Although the 
authors only report experiments on linear loop invariants, the technique can be 
extended to generate polynomial invariants by representing polynomials as lin¬ 
ear combinations of monomials. The effectiveness of the extension however is 
unclear. 

We have the following organization. After preliminaries, we review probabilis¬ 
tic programs in Section 3. Quantitative loop invariants are presented in Section 4. 
Section 5 introduces multivariate Lagrange interpolation. Our technical contri¬ 
bution is presented in Section 6. Applications are given in Section 7. We evaluate 
our technique in the following section. Section 9 concludes our presentation. 

2 Preliminaries 

Let Xm be a sequence of variables xi,a: 2 , • ■ • ,Xm- We use IR[xJ^] to denote the 
set of real coefficient polynomials over m variables of degree at most n. Observe 
that IR[x”] can be seen as a vector space over M of dimension d = For 

instance, the set of d monomials ■ ■ ■ x^ : 0 < di + d 2 + ■••+ dm < n} 

forms a basis of IR[x]]:j]. Given / € K[x]]j] and expressions ei, 62 ,..., e™, we use 
/(ci, 62 ,..., Cm) to denote the polynomial obtained by replacing Xi with for 
1 < i < m in /. Particularly, /(v) is the value of / at v € M™. 

A constraint is a quantifier-free logic formula with equality, linear order, 
addition, multiplication, division, and integer constants. A constraint is linear 
if it contains only linear expressions; otherwise, it is non-linear. A quantified 
eonstraint is a constraint with quantifiers over its variables. A valuation over 
Xm assigns a value to each variable in Xm- A model of a constraint is a valuation 
which evaluates the constraint to true. 

Given a quantified constraint, quantifier elimination removes quantifiers and 
returns a logically equivalent constraint. Given a linear constraint, a Satisfiability 
Modulo Theory (SMT) solver returns a model of the constraint if it exists. 


3 Probabilistic Programs 

A probabilistic program in the probabilistic guarded command language is of the 
following form: 

P ::= skip I abort \x := E\ P;P \ P[p]P \ if (G) then {P} else {P} \ while (G) {P} 

where E is an expression and G is a Boolean expression. For p € (0,1), the prob¬ 
abilistic choice command Po[p]Pi executes Pq with probability p and Pi with 
probability I — p. For instance, x := 1 [0.75] x := 0 sets x to 1 with probability 
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0.75 and to 0 with probability 0.25. A program state is a valuation over pro¬ 
gram variables. For simplicity, we assume program variables are in non-negative 
integers, and use 0 and 1 for the truth values false and true respectively. 


Example 1. Consider the following probabilistic program: 

2 ; := 0; while (0 < a: < j/) { a; := x -|- 1 [0.5] x := x — 1] z := z + 1; } 

The program models a game where a player possesses x dollars and keeps tossing 
a fair coin. The player wins one dollar for each head and loses one dollar for each 
tail. The game ends either when the player loses all his capital, or when the 
amount of capital reaches y dollars for a predetermined y > x. The variable z 
counts the number of tosses made during the game. 


3.1 Expectations 

From an initial program state, a probabilistic program can have different final 
program states due to probabilistic choice commands. Particularly, a function 
over program variables gives different values on different final program states. 
Note that a probabilistic program induces a probability distribution on final 
program states. One therefore can discuss the expected value of any function 
over program variables with respect to that probability distribution. More pre¬ 
cisely, one can take an expectation transformer [24] approach to characterize a 
probabilistic program by annotating the program with expectations. 

Formally, an expectation is a function mapping program states to a non¬ 
negative real number. An expectation is called a post-expectation when it is to 
be evaluated on final program states. Similarly, an expectation is called a pre¬ 
expectation if it is to be evaluated on initial program states. Let prreE and postE 
be expectations, and prog a probabilistic program. We say a quantitative Hoare 
triple (preE) prog {postE) holds if the expected value of postE is no less than 
that of preE before executing prog. Note that the expected values of postE and 
preE are functions over states and hence are compared pointwisely. 

For any Boolean expression G, define the indicator function [G] = 1 if G is 
true and [G] = 0 otherwise. Consider an qualitative Hoare triple {P} prog {Q} 
with a pre-condition P, a post-condition Q, and a classical program prog. Ob¬ 
serve that {P} prog {Q} holds if and only if ([P]) prog ([Q]) holds. Expectations 
are therefore the quantitative analogue to predicates for classical programs. 


3.2 Expectation Transformer for Probabilistic Programs 

Let P and Q be probabilistic programs, g a post-expectation, x a program vari¬ 
able, E an expression, G a Boolean expression, and p G (0,1). Define the expec- 
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tation transformer wp{- ,g) as follows [24]. 

wp{sk\p,g) = g 
wp(abort, g) = 0 
wp{x := E,g) = g[x/E] 

wp{P]Q,g) = wp{P,wp{Q,g)) 

wp{\f (G) then {P} else {Q}, g) = [G] • wp{P, g) + [^G] • wp{Q, g) 
wp{P[p]Q,g) =p-wp{P,g) + {1 - p) ■ wp{Q,g) 
■u;p(while (G) {P},g) = gX.{[G] ■ wp{P,X) + [^G] • g). 

Here g[x/E] denotes the formula obtained from g by replacing free occurrences 
of X by E. The least hxed point operator p is defined over the domain of expec¬ 
tations [16]. It can be shown that (/) P (g) if and only if / < wp{P,g). That is, 
wp{P,g) is the greatest lower bound of pre-expectation of P with respect to g. 
We say wp{P, g) is the weakest pre-expectation of P with respect to g. 

Example 2. The weakest pre-expectation of command a; := a; -I- 1 [p] a: := a; — 1 
with respect to x is computed below: 

wp(x := a: -|- 1 [p] a; := X — 1, a:) 

= p • wp(x := X + 1, x) + (1 — p) • wp{x := x — 1, x) 

= p ■ {x+ 1) + {I - p) ■ {x - 1) 

= X + 2p — 1 

It follows that (x -I- 2p — 1) X := X -I- 1 [p] X := X — 1 (x) holds. 

4 Quantitative Loop Invariants 

Given a pre-expectation preE, a post-expectation postE, a Boolean expression 
G, and a loop-free probabilistic program body, we would like to verify whether 

(preE) while (G) {body} {postE) 

holds or not. One way to solve this problem is to compute the weakest pre¬ 
expectation wp{\N\\\\e{G) {body},postE) and check if it is not less than preE 
pointwisely. However, the weakest pre-expectation of a while-command requires 
fixed point computation. To avoid the expensive computation, we can solve the 
problem by finding quantitative loop invariants. 

Theorem 1 ([15,24]). Let preE be a pre-expectation, postE a post-expectation, 
G a Boolean expression, and body a loop-free probabilistic program. To show 

(preE) while (G) {body} {postE), 

it suffices to find a loop invariant I which is an expectation such that 
1. (boundary) preE < I and I • [^G] < postE; 
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2. (invariant) I ■ [G] < wp{body, I); 

3. (soundness) the loop terminates from any state in G with probability 1, and 

(a) the number of iterations is finite; 

(b) I is bounded above by some fixed constant; or 

(c) the expected value of I ■ [G] tends to zero as the loop continues to iterate. 

In this paper, we only focus on checking the boundary and invariant condi¬ 
tions in Theorem 1. One however can show that any polynomial expectation is 
sound for all examples we consider. In fact, one can establish the soundness of a 
large class of loop invariants before any specific invariant is found. For instance, 
it can be shown that any polynomial expectation satisfies the third soundness 
condition, as long as the probability of exiting the loop is bounded below by a 
non-zero constant in each iteration. We refer the reader to [24] for more details 
of sufficient conditions for soundness. 


5 Multivariate Lagrange Interpolation 


Fix a degree n of quantitative loop invariants and number of variables m. Let 
d = (’”[!'")■ Multivariate Lagrange interpolation is a method to construct an 
explicit expression for any polynomial in ^[x]]:^] by sampling, see e.g., [ 6 , 26, 30]. 
Given d sampling points Si, S 2 ,..., G M™, we can compute a Lagrange basis 
as follows [28]. Let { 6 i, 62 , • ■ •, bd} = {x'(^X 2 ^ ■ ■ ■ : di + d 2 + ••• + dm If: n} 

be the set of monomials in IR[x]]j]. For I < z < d, define 


'di(si) • • • bd{si)' 


Mi = det 


bi 


bd 


<— the zth row 


biisd) ■ ■ ■ bd{sd) 


Observe that Mi G IR[xJ]j] for 1 < z < d. Moreover, Mi{sj) = 0 for z 7^ j and 
Mi(si) = M2 (s2) = ••• = Md(sd) = r for some r G ffi. If r = 0 , then there 
is a geometrical dependency among the sampling points Si, S2,..., [ 2 ], and 

thus no Lagrange basis could be determined from these points. If r 7^ 0 , define 
Bi = Mi/r for I < z < d. Then ,B(si, S2,..., = {Bi : I < z < d} C ffi[x 5 ]^] is 

called a Lagrange basis of IR[x]]:j]. 

Observe that Bi{sj) = [z = j] for 1 < z, j < d. Thus ) = /(®i) 

for I < J < d. Moreover, given any / G K[x” ], we can write / = 

Define the Lagrange functional £[si, S 2 ,..., s^;] : —)• ffi[xJ]j] by 


d 

G [Si, S2, . . . , S(;^] (ci, C2 , . . . , Cd)) ^ ^ CiBi . 

i=l 

Then / = /:[si, S2,..., Sd](/(si),/(S2),...,/(s^)) for any / G M[x”]. We shall 
call /(si), /(S2), ..., fisd) G M the coefficients for / on basis S(si, S2,..., s^). 
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6 Interpolation of Loop Invariants 

Suppose we would like to find a quantitative loop invariant I € IR[x^] for 
(preE) while (G) {body} {postE) 

where preE is a pre-expectation, postE is a post-expectation, G is a Boolean 
expression, and body is a loop-free probabilistic program. Assume the soundness 
of / can be verified. We shall use Lagrange interpolation to find /. 

Let Si, S 2 ,..., Sd e ffi™ be sampling points that determine a Lagrange basis. 
If the coefficients /(si),/(S 2 ),... ,I{sd) € K are known, then 

/ = /:[Si, S2, .... Sd](7(Si), J(S2), ..., /(Sd)) 

by Lagrange interpolation. Our idea therefore is to find the coefficients via 
constraint-solving. By the boundary and invariant conditions in Theorem 1, we 
have the following requirements about any loop invariant I: 

preE < I 

I ■ [^G] < postE (1) 

/ • [G] < wp{body, I). 

Example 3. Consider 

{xy — x^) z := 0; while (0 < a; < j/) { a; := a; -|- 1 [0.5] a: := a: — 1; z := z + 1; } (z). 

The following must hold for any loop invariant / 

xy — x^ < I 
I -[x<0\/y<x]<z 

I ■ [0 < X < y] < 0.5 ■ I{x + l,y, z + 1) + 0.5 ■ I{x — l,y, z + 1). 

Observe that wp{x := a; -|- 1 [0.5] x := x — 1; z := z + 1,1{x,y, z)) = wp(x := 
x+\ [0.5] X := x—\,I{x, y, ^-1-1)) = 0.5-wp{x := x+l,I{x, y, z+l))+0.5-wp(x := 
X — I, I{x,y,z + 1)) = 0.5 ■ I{x + I, y,z + 1) + 0.5 ■ I{x — I, y,z + 1). 

Requirements (1) can have indicators on both sides of inequality, which is 
beyond the capability of the solvers we use. We would like to obtain a constraint 
by removing indicators in two steps. First, we rewrite the expectations to a 
normal form. An expectation is in disjoint normal form (DNF) if it is of the 
form / = [Pi] • /i -I- • • • -h [Pfe] • fk, where Pi,P 2 ,... ,Pk are disjoint, that is, at 
most one of Pi, P 2 ,..., P^ evaluates to true on any valuation. 

Theorem 2 ([22]). Given an expectation of the form f = [Pi]-/i-l-l-[7^fc]-/fc, 

/ is equivalent to the following expectation in DNF: 

H A ■Y.f- 

ICK [_ \iei / \j&K\i J j iei 
where K = {1,2,... ,k}. 
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We then transform inequalities between expectations in DNF to constraints. 


Theorem 3 ([15]). Suppose / = [P^] • /^ +-h [Pk] ■ fk and g = [Qi] ■ gi + 

• • • + [Qh] • gh are expectations over in DNF. f < g iff for every 

A A {{Pj ^ Qi) ^ fj — 9i) ^ 

j£Ki£H 

A f f A ^Qi ^ ^ /i < 0^ A A ( ( A ^Pj A <3*) ^ 0 < 

JGK WieH / J ieH \\jeK J 

where iF = {1, 2,..., /c} and H = {1, 2,..., h}. 

Example 4- By Theorem 2 and 3, requirements in Example 3 are equivalent to 

xy — x'^ < I A 
{x<0\/y<x)^I<z A 
(a:<0V?/<x)^0<z A 

{0 < X < y) => I < 0.5 • I{x + l,y, z + 1) + 0.5 ■ I{x — l,y, z + 1) A 
(0 < a; < J/) ^ 0 < 0.5 ■ l\x + l,y, z + t) + Q.b ■ I{x — l,y, z F) 

for every x, y, z. 

We define the loop invariant constraint ())[si, S 2 ,..., Sc;](ci, C 2 ,..., c^) as the 
constraint transformed from the requirements (1), where the quantitative loop 
invariant / is replaced by Lagrange functional £[si, S 2 ,..., Sc;](ci, C 2 ,..., Cd). 

Example 5. We have the following loop invariant constraint from Example 4. 

S 2 , . . . , Sio](ci, C 2 , . . . , Cio) = 
xy-x"^ < /:[si,S 2 ,...,Sio](ci,C 2 ,...,Cio) A 
(cc < 0 V y < x) ^ >C[si,S 2 ,... ,Sio](ci,C 2 ,..., cio) < 2 : A 
(0 < a; < y) ^ 2 • £[si,S 2 , ..., Sio](ci, C 2 ,... ,cio) < 

£[si, S 2 ,..., Sio](ci, C 2 ,..., cio)(a; + 1, y, z + 1) + 
£[si, S 2 ,..., Sio](ci, C 2 ,..., Cio)(a; — 1, y, z + 1). 

With loop invariant constraints, it is easy to state our goal. Observe that 
3si, S 2 ,..., Sd 3ci, C 2 ,..., Cd Vx„. ())[si, S 2 ,..., Sd](ci, C 2 ,..., Cd) implies the exis¬ 
tence of a quantitative loop invariant satisfying the boundary and invariant con¬ 
ditions in Theorem 1. Our strategy hence is to choose sampling points Si, S 2 ,..., Sd 
such that 3ci,C2,...,CdVx^.(^[si,S2 ,...,Sd](ci,C2,...,Cd) holds. 

We will choose sampling points to simplify the loop invariant constraint. 
Recall that sampling points are not unique in Lagrange interpolation. Eor a 
loop invariant constraint, we select sampling points so that several coefficients 
among ci, C 2 ,..., Cd are determined. This helps us to evaluate the quantified loop 
invariant constraint 3ci, C 2 ,..., Cd Vx^. </)[si, S 2 ,..., Sd](ci, C 2 ,..., Cd). 

To evaluate the quantified loop invariant constraint, observe that the La¬ 
grange functional £[si, S 2 ,..., Sd](ci, C 2 ,..., Cd) is a multivariate polynomial over 



Cl, C 2 , ■ ■ ■, Cd and x^. A loop invariant constraint is hence non-linear. However, 
^[si, S2,..., S£i](ci, C2,..., Cd)(e) is a linear constraint over coefficients for every 
experiment e G Z™, i.e., valuation over x^. We therefore use experiments to 
construct a series of linear constraints and hnd coefficients by an SMT solver. 


Input: (preE) while (G) {body} (postE) : a loop over program variables x™; n : 
the degree of an loop invariant 

Output: I : a loop invariant satisfying the boundary and invariant conditions 
in Theorem 1 

rf^rr); 

Si, S2,..., Sti SamplingPointsO; 

C <r- InitialConstraint(si,S2,...,s^j); 

while C has a model do 

Cl, C2,..., Cd a model of C from an SMT solver; 

switch RandomExperiments{C) do 
case Pass: 

switch UQElem{yim., 0[si,S2,... ,Sd](ci, C2,..., Cd)) do 
case True-, return T [si, S2,..., Sd] (ci, 62,..., Cd) ; 
case Counter Example (e) : RefineConstraint(G, e) ; 

endsw 

case Counter Example (e) : RefineConstraint(G, e) ; 

endsw 

end 

//No loop invariant 

Algorithm 1 : Quantitative loop invariant synthesis 

Algorithm 1 shows our top-level algorithm. The algorithm starts by choosing 
sampling points (Section 6.1). The sampling points are then used to construct the 
initial linear constraint over coefficients (Section 6.2). The while loop evaluates 
the quantified loop invariant constraint 3ci, C 2 ,..., Cd Vx^. S 2 ,..., Sd](ci, C 2 , 
...,Cd). In each iteration, the algorithm selects coefficients ci,C 2 ,...,Cd by a 
model of the linear constraint obtained from an SMT solver. It then checks 
whether Vx^. ^[si, S 2 ,..., Sd](ci, 62 ,..., Cd) is true. The algorithm does this by 
first trying a number of random experiments (Section 6.3). Only after the ran¬ 
dom experiments are passed, will the algorithm performs universal quantiher 
elimination to evaluate the quantified constraint (Section 6.4). If the random ex¬ 
periments fail, or quantifier elimination does not evaluate to true, our algorithm 
refines the linear constraint by a counterexample and reiterates (Section 6.5). 


6.1 Choosing Sampling Points 

In Lagrange interpolation, sampling points need be chosen in the first place. We 
would like to choose sampling points so that as many coefficients are determined 
as possible. To this end, observe that £[si, S 2 ,..., Sd](ci, C 2 , ..., Cd)(si) = for 
1 < f < d. In other words, ^[si, S 2 ,..., Sd](ci, C 2 ,..., Cd)(si) can be signfficantly 
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simplified if a sampling point is used as an experiment. Consider, for instance, 
the boundary condition in our running example: 

xy-x"^ < £[si,S2,...,Sd](ci,C2,...,Cd)(a;,7/, 2;); and 

(x < 0 V j/ < a;) ^ £[81,82 ,... ,Sd](ci,C2,.. .,Cd){x,y,z) < z 

If Sj = (0,3,0) is a sampling point, then the condition can be simplified to 0 < cj 
and Cj < 0. Thus cj is determined by choosing (0,3,0) as both a sampling point 
and an experiment. 

Ideally, one would choose sampling points so that all coefhcients are deter¬ 
mined. Unfortunately, such points tend to be geometrically dependent in prac¬ 
tice. Thus we cannot expect to establish a Lagrange basis from these points 
exclusively. Instead, we try to hnd sampling points which yield a Lagrange basis 
and determine as many coefficients as possible. We adopt a weighted random 
search for this purpose. That is, we pick sampling points randomly according 
to their weights, so that points determining more coefhcients are more likely 
to be picked. If the randomly selected sampling points fail to yield a Lagrange 
basis, we discard them and select other sampling points randomly again. In our 
experiments, this heuristic finds pretty good sampling points in reasonable time. 

6.2 Initial Constraint 

After sampling points are chosen, we compute the initial linear constraint over 
coefhcients. Recall that £[si, 82 ,..., Sd](ci, C2 ,..., Cd)(si) = Ci for I < f < d. 
By taking sampling points as experiments, the loop invariant constraint (/'[si, 82, 

..., S£i](ci, C2 ,..., Cd) is simplihed to a linear constraint over ci, C2 ,..., c^. 

Example 6. Consider the loop invariant constraint in Example 5. We hrst choose 
10 sampling points Si,..., Sio (see table below) to establish a Lagrange basis. We 
then compute the initial constraints by simplifying the loop invariant constraint 
with the sampling points. For example, we obtain constraint C2 = 0 from point 
82 = (2, 2, 0) as follows: 

(/)[si, 82 ,..., Sio](ci, C2 ,..., cio)(2, 2,0) 

iff (2 • (2 - 2) < C2) A ((2 < 0 V 2 < 2) ^ C2 < 0) A 

(0 ^ 2 < 2 0 ^ —2ci — 42 c 2 -t- 3c3 -t- 27c4 -t- 9c5 -t- Qcq -t- 14c7 — 12c8 — 3cio) 

iff 0 < C2 A C2 < 0 iff C2 = 0. 

We list all initial constraints in the following table, where (/)[si,S2 ,... ,Sio](ci, 
C2 ,..., cio)(si) is denoted by for simplicity. 


i 

S* 


i 

Si 

V'(Si) 

i 

Si 


1 

0,3,0 

Cl = 0 

2 

2,2,0 

C 2 = 0 

3 

0,3,1 

0 < C 3 < 1 

4 

1,1,0 

C 4 = 0 

5 

1,1,2 

0 < C 5 < 2 

6 

2,2,1 

0 < C 6 < 1 

7 

3,3,0 

C 7 = 0 

8 

0,0,1 

0 < cg < 1 

9 

0,1,0 

Cg = 0 

10 

2,3,3 

6 < 3cio < —4ci — 36 c 2 -I- Sca -1- 30c4 -1- 12c5 — Gce -1- I 6 C 7 — 14c8 
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Note that our choice of sampling points helps the initial constraints determine 
5 coefficients. If a standard monomial basis were used, none of the coefficients 
could be determined by the initial constraints. 

6.3 Random Experiments 

From a linear constraint of coefficients, we obtain a model Ci,C2 ,... ,Cd of the 
linear constraint from an SMT solver. Recall that we would like to check if 
Vxm- S2 ,..., Sd](ci, C2 ,..., Cd) is true. Before using expensive quantifier elim¬ 
ination immediately, we first perform a number of random tests. If (/'[si, S2 ,..., s^] 
(ci, C2, • ■ •, Cd){e) evaluates to true for all random experiments e G Z™, the coeffi¬ 
cients Cl, C2, • • •, Cd may induce a loop invariant. Otherwise, a witness experiment 
e is used to refine the linear constraint over coefficients. 

When the coefficients do not induce a loop invariant, the random experiments 
make it possible to avoid expensive quantifier elimination and to obtain a witness 
experiment without resorting to an SMT solver. This possibility is important, 
because the solver we use does not always find a valid witness experiment. 

6.4 Universal Quantifier Elimination 

After random tests, we perform quantifier elimination check if Vx^. 4>[si, S2 ,..., 
Sd](ci, C2, ■ ■■ ,Cd) is true. If so, the polynomial £[si,S2 ,... ,8^] (ci,C2, ■■■,Cd) is 
a quantitative loop invariant satisfying the boundary and invariant conditions. 
Otherwise, we obtain a witness experiment to refine our linear constraint. 

Universal quantifier elimination is carried out in two steps. We first eliminate 
the quantifiers in the ordered field theory [3,11]. Intuitively, the ordered field 
theory formalizes real numbers M. Since quantifier elimination tools such as 
Redlog [10] employ algebra and real algebraic geometry, eliminating quantifiers 
over real numbers is more efficient than over integers. If Vx^. (/>[si, S2 ,... ,S£i] 
(ci, C2 ,..., Cd) is true over M, it is also true over Z. Thus ci, 62 ,..., Cd induces 
a quantitative loop invariant. Otherwise, we perform quantifier elimination over 
Z. 

If Vx^. (/)[si, S2 ,..., S(i](ci, C2 ,..., Cd) evaluates to true over Z, we are done. 
Otherwise, quantifier elimination gives a constraint equivalent to the quantified 
query. We then use an SMT solver to obtain a witness experiment. We abort the 
procedure if the solver times-out or fails to yield a valid witness experiment. 

6.5 Constraint Refinement 

Let e = {xi,X 2 ,. •., Xm) G be a witness experiment such that ^f’[si, S2 ,..., s^] 
(ci, 62 ,..., Cd)(e) evaluates to false. Recall that we would like to find coefficients 
Cl, C2 ,..., Cd such that (/)[si, S2 ,..., Sd](ci, C2 ,..., Cd) is true for every valuations 
over Xm- Particularly, ())[si, S2 ,..., Sd](ci, C2 ,..., Cd)(a;i, X2) • • ■, Xm) must also be 
true for such coefficients. Note that ^[si, S2 ,..., Sd](ci, C2 ,..., Cd){xi,X 2 , ■ •., Xm) 
is a linear constraint on coefficients ci,C2 ,...,Cd that excludes the incorrect 
coefficients Ci, £2 ,..., c^. By adding the linear constraint to the current set of 
constraints, our algorithm will find different coefficients in the next iteration. 
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7 Applications 


Name 

preE 

postE 

Time 

O 

1- 

L 

T 

S 

#L 

#T 

#s 

ruin 

xy — x'^ 

z 

3.6s 

0% 

0.3s 

2.8s 

0.3s 

5.2 

61.5 

5.0 

geol 

X -\- 3zy 

X 

3.0s 

0% 

1.4s 

1.5s 

0.1s 

21.4 

32.1 

1.0 

geo2 


X 

8.0s 

0% 

1.4s 

4.9s 

0.2s 

22.3 

108 

3.8 

binl 

X + \ny 

X 

4.5s 

0% 

1.4s 

2.9s 

0.1s 

22.7 

64.0 

1.0 

bin2 

- |n-b \ny 

X 

77.5s 

19% 

0.4s 

9.0s 

15.8s 

5.9 

185 

10.3 

sum 


X 

2.5s 

0% 

0.1s 

1.9s 

0.4s 

1.2 

42.7 

5.8 

prod 

In'" - \n 

xy 

15.7s 

5% 

0.3s 

4.3s 

2.3s 

4.2 

97.0 

6.5 

coinl 

2 2-^ 

1 — X + xy 

2.0s 

0% 

0.4s 

1.2s 

0.3s 

5.7 

27.1 

3.6 

coin2 

k-kv 

X + xy 

3.9s 

0% 

0.6s 

1.4s 

1.8s 

9.8 

32.0 

4.2 

coins 

1 - |a; - fy-t in 

n 

18.5s 

1% 

12.6s 

2.6s 

1.4s 

202 

57.9 

7.2 


Table 1. Summary of results. The name of each experiment is shown in column Name. 
The annotated pre- and post-expectations are shown in columns preE and postE, re¬ 
spectively. Column Time lists the mean execution times our prototype took to verify the 
annotations, and TO denotes the timeout ratios. Besides, columns L, T, and S show the 
average times our prototype spent in sampling a Lagrange basis, making random tests 
and synthesizing coefficients, respectively. Finally, columns #L, ifT, and show the 
average numbers of iterations onr prototype has taken to find sampling points, make 
random tests, and refine constraints, respectively. The last six colnmns are calculated 
based on the rnns that finished within timeouts. 


We have implemented a prototype in JavaScript to test our techniques. For each 
simple loop, we manually perform the weakest pre-expectation computation and 
the DNF transformation to translate the requirements (1) into loop invariant 
constraints. We then use our prototype to find a quantitative loop invariant based 
on the constraints. Our prototype uses GNU Octave [12] to compute Lagrange 
interpolation, Z3 [7] to solve linear constraints, and Redlog [10] to perform 
quantifier elimination. The experiments are done on an Intel Xeon 3.07GHz 
Linux workstation with 16GB RAM. 

We consider six types of applications: gambler’s ruin problem, geometric 
distribution, binomial distribution, sum of random series, product of random 
variables, and simulation of a fair coin. We also consider variants of geometric and 
binomial distributions. For the fair-coin simulation, we find three quantitative 
loop invariants to prove the correctness and the expected execution time of the 
simulation. In each probabilistic program, we annotate the while loop with a pre¬ 
expectation and a post-expectation. Note that the annotated pre-expectation is 
by construction a precise estimate of the annotated post-expectation. 

Our results are summarized in Table 1. We use a fixed random seed for all ex¬ 
periments and calculate the averages over 1000 runs with a 300-second timeout. 
Due to random sampling, the prototype may synthesize different loop invariants 
in different runs of the same experiment. We now discuss the applications in 
more details. 
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Gambler’s Ruin Problem. In Example 1, we consider a game where a player 
has X dollars initially and plays until he loses all his money or wins up to y — a; 
dollars for some y > x. The expected number of rounds before the game ends is 
E[z\ = x-{y — x). Our prototype proves this result within 3.6 seconds on average. 

Geometric Distribution. The geometric distribution describes the number of 
tails before the first head in a sequence of coin-tossing. When the probability of 
head is 0.25, we expect to see = 3 tails before the first head. The following 

program computes a geometrically distributed random variable x: 

X :=0; z := 1; while (z ^ 0) { z '■= 0 [0.25] x := x + y; } 

Our prototype finds a quantitative loop invariant for the pre-expectation E[x\ = 
3y within 3 seconds on average. 

We moreover consider the following variant of the game. A player keeps 
flipping a coin until the head turns up. He wins k dollars if the tail turns up at 
the fcth flip. The variant is modeled as follows. 

a; := 0; y := 0; z := 1; while [z ^ 0) { y := y + 1; z '■= 0 [0.25] x := x + y; } 

The expected amount of money a player can win is E[x] = ^ (0.25“^ ~ l) = T- 
Our prototype proves this result within 8 seconds on average. 

Binomial Distribution. The binomials distribution describes the number of 
heads that appear in a fixed number of coin-tossing. If the probability of head is 
0.25 and the number of tosses is n, then the expected number of heads is 0.25n. 
The following program computes a binomially distributed random variable x: 

X := 0; while {0 < n) { x := x + y [0.25] skip; n := n — 1; } 

Our prototype proves E[x\ = 0.25nj/ within 4.5 seconds on average. We moreover 
consider the following variant. A player flips a coin for n times. At the fcth flip, 
he wins k dollars if the head turns up and wins y dollars otherwise. This game 
can be modeled as follows. 

X := 0; while {0 < n) { x '■= x + n [0.25] x '■= x + y; n '■= n — 1; } 

The expected amount of money a player can win is E[x] = 0.25 • |n(n -|- 1) -(- 
(1 — 0.25) • ny = ~ + 1^2/- Our prototype proves this result within 77.5 

seconds on average. 

Sum of Random Series. Consider a game where a player flips a coin for n 
times. The player wins k dollars if the head turns up at the /cth flip. The following 
program models this game when the head probability of the coin is 0.5: 

X := 0; while {0 < n) { x := x + n [0.5] skip; n := n — 1; } 
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The expected amount of money the player can win from this game is E[x\ = 
0.5 • * = 0.5 • \n{n + 1) dollars. Our prototype proves this result within 2.5 

seconds on average. 

Product of Dependent Random Variables. We consider a game where two 
players flip a coin for n times. The first player wins one dollar for each head 
and the second player wins one dollars for each tail. When the head probability 
of the coin is 0.5, this game can be modeled by the following program where 
variables x, y represent the amount of money won by the respective players: 

while (0 < n) { X := X + 1 [0.5] y := y + 1; n := n — 1; } 

It can be shown that E[xy\ = — n). Our prototype proves this result within 

15.7 seconds on average. 

Simulation of a Fair Coin. We consider an algorithm that simulates a fair 
coin flip using biased coins [15]: 

X := 0; y := 0; n := 0; 

while (x = y) { X := 1 [0.25] x := 0; y := 1 [0.25] y := 0; n := n + 1; } 

The algorithm uses two biased coins x and y with head probability 0.25. The 
main loop flips the two coins at each iteration and terminates when the coins 
show different outcomes. The value of x is then taken as the final outcome, with 
1 representing the head and 0 representing the tail. 

To show that the algorithm indeed simulates a fair coin flip, we prove 

0.5 — 0.5x < wp{loop, 1 — X + xy) and 0.5 — 0.5y < wp{loop, x + xy), 

where loop denotes the while loop in the program. Since x = y = 0 before the loop 
starts and xy = 0 after the loop stops, we see that 0.5 < E[1 — x] and 0.5 < ^^[x] 
on termination. Since x € {0, 1}, it follows that Pr{x = 1} = Pr{x = 0} = 0.5 
on termination, and thus the correctness of the algorithm is concluded. 

Observe moreover that the number of iterations made before termination 
obeys a geometric distribution with head probability 0.25 • 2(1 — 0.25) = 0.375. 
Hence, the expected number of iterations is E[n] = + 1 = |. This result 

is verified by our prototype within 18.5 seconds on average. 

8 Evaluation 

Our technique is closely related to the Prinsys tool [15], which implements 
the constraint-based quantitative invariant synthesis approach developed in [22]. 
Prinsys receives a probabilistic program and a template with unknown coef¬ 
ficients. It derives loop invariant constraints from the template and exploits 
SMT-solvers to perform quantifier elimination and simplification for the con¬ 
straints. The tool generates a formula, which is in effect a conjunction of non¬ 
linear inequalities, describing all coefficients that make the supplied template 
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an inductive loop invariant. A concrete quantitative invariant has to be derived 
manually by extracting solutions from the formula. 

For our prototype, the input is a quantitative Hoare triple and there are 
three possible outputs: “unknown” (due to timeout or invalid counterexamples), 
“disproved” with a witness (a valuation of program variables), and “proved” 
with a proof (a quantitative loop invariant). For Prinsys, it receives a program 
and a template, and outputs a constraint describing all inductive loop invariants 
in form of the template. To verify a specific Hoare triple with Prinsys, one 
has to encode the interested pre- and post-expectations as well as the form of 
possible invariants into the same template. Designing a template for Prinsys 
is a tricky task that needs to be done on a case-by-case basis. In contrast, our 
technique does not require manually supplied templates, though the degree of 
loop invariants has to be fixed a priori. 

One could use templates to represent non-linear loop invariants. We neverthe¬ 
less failed to verify any of our non-linear examples with Prinsys. In particular, 
we could not generate formulae that subsume the quantitative loop invariants 
computed by our prototype. This however does not imply that our examples are 
beyond the capability of Prinsys, since we could not arguably try all templates 
manually. The designers of Prinsys also examined their tool on some non-linear 
examples, e.g., the gambler’s ruin problem, and reported negative results in [15]. 
Generally, when the supplied template is non-linear, it becomes intractable to 
derive a loop invariant, or even to decide the existence of a loop invariant, from 
the formula yielded by Prinsys. Maybe a counterexample-refinement approach 
is helpful here, but this requires further research and experiments. 

9 Conclusion 

We propose an automated technique to generate polynomial quantitative invari¬ 
ants for probabilistic programs by Lagrange interpolation. Fixing the degree of 
loop invariants, our technique can infer polynomial quantitative loop invariants 
for simple loops. By choosing sampling points carefully, constraints are simpli¬ 
fied so that coefficients of loop invariants can be determined. We also develop 
a counterexample-guided refining heuristics to find coefficients of quantitative 
loop invariants. We report applications in several case studies. 

Our technique does not yet support parameters such as probability in proba¬ 
bilistic choice commands. Such parameters would induce non-linear constraints 
over coefficients and parameters. SMT solvers however could not find candidate 
coefficients and parameters as easily. Also, non-determinism is not implemented 
in our prototype. We plan to address both issues in our future work. 
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